A serpin gene from a parasitoid wasp disrupts host immunity and exhibits adaptive alternative splicing

Alternative splicing (AS) is a major source of protein diversity in eukaryotes, but less is known about its evolution compared to gene duplication (GD). How AS and GD interact is also largely understudied. By constructing the evolutionary trajectory of the serpin gene PpSerpin-1 (Pteromalus puparum serpin 1) in parasitoids and other insects, we found that both AS and GD jointly contribute to serpin protein diversity. These two processes are negatively correlated and show divergent features in both protein and regulatory sequences. Parasitoid wasps exhibit higher numbers of serpin protein/domains than nonparasitoids, resulting from more GD but less AS in parasitoids. The potential roles of AS and GD in the evolution of parasitoid host-effector genes are discussed. Furthermore, we find that PpSerpin-1 shows an exon expansion of AS compared to other parasitoids, and that several isoforms are involved in the wasp immune response, have been recruited to both wasp venom and larval saliva, and suppress host immunity. Overall, our study provides an example of how a parasitoid serpin gene adapts to parasitism through AS, and sheds light on the differential features of AS and GD in the evolution of insect serpins and their associations with the parasitic life strategy.


Introduction
Alternative splicing (AS) is a frequent regulatory process of transcription in animals, plants and fungi [1,2].Through differential inclusion/exculsion of exons and introns, AS produces multiple variant proteins from a single gene [1,2].AS and gene duplication (GD) are important sources of genetic innovation and protein diversity [3][4][5][6].Compared to GD, relatively little is known about the evolution of AS and its role in adaptation [3,7,8].Moreover, the relationship and difference between these two evolutionary processes, AS and GD, are largely understudied [4,9,10].
Here, we analyzed the evolutionary trajectory of PpSerpin-1 and compared the different features of AS and GD in serpin evolution and their associations with the parasitic life strategy.In addition, we demonstrate that PpSerpin-1 is involved in wasp's immune responses and has been recruited to both wasp venom and larval saliva, to regulate host immunity.

AS contributes to serpin protein diversity with GD and domain duplication in insects
Utilizing the accumulated transcriptomic data, we found that the PpSerpin-1 gene has alternative splicing, with two different N-terminal and 21 C-terminal AS forms (Figs 1A and S1).At

PLOS PATHOGENS
A parasitoid serpin gene exhibits adaptive alternative splicing To construct the phylogeny of the PpSerpin-1 gene, a homology search was performed against the insect Refseq_protein database in NCBI using the constitutive region of PpSerpin-1 (for details, see Materials and Methods), resulting in a total of 1,230 matched genes.Phylogenetic analyses showed that the C-terminal AS of serpin genes was clustered into one clade with 731 genes but was relatively rare in the outgroup (Fig 1C).The following analyses were all focused on this clade with enriched C-terminal AS.
Both N-terminal and C-terminal AS are widespread in insects (Fig 1C).Out of 731 genes in this clade, 261 genes (35.7%, not including PpSerpin-1) have either N-terminal or C-terminal AS.At the N-terminus, 153 out of 731 (21.0%) had N-terminal AS, with 115 out of 153 having both extracellular and intracellular forms.These genes with both extracellular and intracellular forms at the N-terminus had more C-terminal AS than genes with only one N-terminal form, either extracellular or intracellular (Fig 1D ; Mann-Whitney U test (MWUT); both comparisons: p < 0.001).At the C-terminus, 192 out of 731 (26.3%) genes had C-terminal AS, producing 1406 serpin proteins, with an average of 7.32 serpin proteins per gene.Notably, the number of C-terminal AS events showed rapid fluctuation, suggesting a high turnover rate of exon gain and loss (Fig 1C).
Together, AS, GD and domain duplication within genes contribute to serpin protein diversity.Forty-five genes have multiple serpin domains, producing 105 serpin domains with an average of 2.33 serpin domains per gene.The remaining 494 genes, which have neither C-terminal AS nor multiple serpin domains, likely originated by GD.Taken together, AS, GD and domain duplication accounted for 70.1%, 24.6% and 5.2% of the total number of 2005 serpin proteins/domains, respectively.Moreover, AS is negatively correlated with GD and domain duplication.The number of duplicated serpin gene copies in a species is inversely correlated with the percentage of C-terminal AS (Fig 1E ; Spearman correlation; ρ = -0.58,p = 4.0e -12 ) and with the means of C-terminal variants (Fig 1F; Spearman correlation; ρ = -0.59,p = 1.2e -12 ).Both correlations held after phylogenetic correction (both tests: p < 0.001).In addition, AS and domain duplication within genes are mutually exclusive, as no multiple-serpin gene contains C-terminal AS (Fig 1G ; MWUT; single-vs multipledomain serpin: W = 19755, p = 4.8e -5 ).

AS genes show divergent sequence features from non-AS genes
Both AS and GD are major sources of protein diversity.To compare their differences in gene sequence characteristics, we divided the single-domain serpins into genes with C-terminal AS (referred to as the AS gene set) and those without (referred to as the non-AS gene set).Here, we focused on C-terminal but not N-terminal AS, as N-terminal AS often changes protein localization but not the mature serpin protein sequence.For the protein sequence, we divided the serpin proteins into constitutive (present in all isoforms) and C-terminal alternative spliced regions based on the end of the hinge region (Fig 2B).Conservation scores were estimated based on alignments and then mapped to PpSerpin-1F as the reference (Fig 2C).In the constitutive region, the AS gene set was more conserved than the non-AS gene set (Fig 2C and 2D; Wilcoxon signed-rank test (WSRT); V = 985, p < 2.2e -16 ).Conversely, in the alternative region, the AS gene set was more variable than the non-AS gene set (Fig 2C and 2D; WSRT: V = 1125, p = 1.5e -05 ).To exclude the effect of reference sequence selection, we also used non-AS genes within the clade (Dmel_SPN55B, NP_524953.1)or in the outgroup (Dmel_SPN27A, NP_001260143.1)as reference sequences, and the conclusions held regardless of reference selection (all comparisons: p <0.001).
Furthermore, we compared the regulatory sequences of the AS and non-AS gene sets.For RNA-binding motifs, no motifs were enriched in the AS gene set compared to the non-AS gene set (p > 0.05), which may be due to the general short length of RNA-binding motifs and lack of statistical potency, as previously reported [44].For binding motifs of transcription factors, two motifs were enriched in the AS gene set compared to non-AS.They were M02712_2.00dl (dorsal) (Fig 2E; p = 0.010, enrichment ratio 8.01) and M03953_2.00Dif (Dorsal-related immunity factor) (p = 0.015, enrichment ratio 7.39).dl and Dif are both transcription factors involved in the Toll pathway [45].Compared to shuffled random sequences, dl and Dif binding motifs are significantly enriched in the AS gene set (dl: p = 3.33e -7 ; Dif: p = 7.70e -6 ) but not the non-AS gene set (p > 0.05).In addition, the probabilities of the presence of dl and Dif binding motifs are significantly correlated with the numbers of C-terminal AS (Fig 2E ; Spearman correlation; dl: ρ = 0.48, p < 2.2e -16 , dif: ρ = 0.48, p < 2.2e -16 ).These correlations are significantly higher than correlations using shuffled sequences, which have the same lengths as the true gene sequences (both comparisons: p < 0.0001).These results suggest that genes with more C-terminal AS are more likely to contain dl and Dif binding motifs, and this is not simply due to their longer sequence lengths.
We further asked how these dl and Dif binding motifs are distributed in sequences.We defined 6 region categories according to their strands (on coding or noncoding strands) and location (in promoter, exon or intron regions).Most of the best-hits were in the intron region of the coding strand for the AS gene set, while most of the best-hits were in the exon region of the noncoding strand for the non-AS gene set (Fig 2F).Best-hits of dl and Dif binding motifs are more likely to locate in the intron region of the coding strand of the AS gene set than that of the non-AS gene set (Fig 2F ; dl: χ 2 = 50.40,p < 0.00001; dif: χ 2 = 55.38,p < 0.00001) and shuffled control sequences of the AS gene set (dl: χ 2 = 10.8984,p = 0.000962; dif: χ 2 = 7.9206, p = 0.004888).In addition, when we restricted the motif search to one of the six region categories, the correlations between the probabilities of the presence of dl and Dif binding motifs and the numbers of C-terminal AS were significantly higher than correlations using shuffled sequences in the intron region of the coding strand but not in the other five region categories (dl: z = 2.7, p = 0.0065; dif: z = 4.4, p < 0.0001; all other comparisons: p > 0.05).Therefore, we conclude that dl and Dif binding motifs tend to be enriched in the intron region of the coding strand within the AS gene set.

Parasitoid wasps employ fewer AS in serpin, but PpSerpin-1 is an expansion of the AS exon number
We further investigated the potential relationship between AS and the parasitic life strategy.Higher numbers of total serpin proteins/domains per species were found in parasitoid wasps than in nonparasitoid hymenopterans ( Patterns are similar when comparing parasitoid wasps with all nonparasitoid insects (not limited to Hymenoptera) (all comparisons: p < 0.05).
However, PpSerpin-1 shows an expansion of AS number compared to other parasitoids.The C-terminal AS numbers of the PpSerpin-1 and Nasonia vitripennis LOC100122505 genes are increased compared to their homologous genes in the Chalcidoidea wasps, i.e., Trichogramma pretiosum, Ceratosolen solmsi marchali, and Copidosoma floridanum (Fig 3G).For PpSerpin-1 alternative exons, H and X2 clustered together with the XP_008201831.1-specificexon of the N. vitripennis LOC100122505 gene as the closest outgroup, suggesting PpSerpin-1specific exon expansion after divergence from the common ancestor with the N. vitripennis LOC100122505 gene, which is estimated to have occurred approximately 19 million years ago (MYA) [46].Sixteen out of 21 alternative exons show clear orthologous relationships with alternative exons of the N. vitripennis LOC100122505 gene, suggesting that most alternative exons of PpSerpin-1 existed prior to the divergence between Pteromalus and Nasonia.
We then estimated the pairwise substitution rates of orthologous exons between the PpSerpin-1 and N. vitripennis  -4 ).These results suggest positive selection diversifying protein sequences in alternative exons with lower substitution rates in synonymous sites, which can be important in the regulation of the AS process [1,2,47].

PpSerpin-1 is involved in the wasp's immune response and recruited into both wasp venom and larval saliva
Next, we investigated the function of isoforms of PpSerpin-1.First, isoform-specific RT-PCR confirmed the presence of all 21 C-terminal AS forms of PpSerpin-1 (Fig 4A).These isoforms were more likely to be upregulated by the gram-positive bacterium Micrococcus luteus and the fungus Beauveria bassiana than by PBS or the gram-negative bacterium Escherichia coli (Fig 4B ; WSRT; all comparisons: p < 0.001).
In our previous study, PpSerpin-1O was isolated from P. puparum venom [22].RNA-seq data showed that multiple PpSerpin-1 isoforms are expressed in the venom gland (Fig 4C).We also re-analyze the published P. puparum venom proteomic data [37] and identified isoforms B, G, O and P. By Western blotting, PpSerpin-1 and its homologous proteins were detected in the venoms of P. puparum and its relative species N. vitripennis, Muscidifurax raptor, and M. uniraptor (Fig 4D).Additionally, by examining published venom sequences, we also identified PpSerpin-1 homologous proteins in the venom of Trichomalopsis sarcophagae and Urolepis  rufipes, exhibiting 90.1% and 91.6% identity to the constitutive sequence of PpSerpin-1 protein, respectively [17,35].
In addition, both RT-PCR and transcriptomic results showed that some PpSerpin-1 isoforms were highly expressed in the larval stage (Fig 4A and 4C), particularly in the larval salivary gland (Fig 4C).We therefore hypothesize that some isoforms have been recruited into P. pupuarum larval saliva, which has been demonstrated to suppress host melanization [48].Consistent with this hypothesis, PpSerpin-1 proteins were detected by Western blot using PpSerpin-1 antibodies in PBS incubated with P. puparum larvae (S5
Among these protease candidates, PrPAP1 is P. rapae prophenoloxidase-activating proteinase 1 and is critical for hemolymph melanization.The presence of PrPAP1 in pull-down samples of PpSerpin-1A, O and P was confirmed by Western blotting using PrPAP1 antibodies (Fig 6A).In vitro inhibitory assays showed that isoforms A, O and P but not G significantly inhibited the activity of recombinant PrPAP1 (Fig 6B ; Dunnett's test, for A, O and P: p < 0.001; for G: p = 0.94).Consistent with this, isoforms A, O, and P but not G formed SDS-PAGE stable complexes with recombinant PrPAP1 (Figs 6C, 6D and S9).Presumably, two identified peptides of PrPAP1 in the pull-down sample of PpSerpin-1G may be leaks from the neighboring pull-down samples of PpSerpin1-A and O.The stoichiometries of inhibition (SIs) of PrPAP1 by PpSerpin1-A and P were 13.37 and 197.33, respectively (Fig 6E and 6F), which were larger than the previously reported SI of 2.3 by PpSerpin-1O [22].These results demonstrate that PpSerpin-1 isoforms A, O, and P suppress host melanization immunity by inhibiting PrPAP1.

Discussion
In this study, we report the divergent features of AS and GD in the evolution of insect serpins and their potential associations with the parasitic life strategy.We also found that PpSerpin-1 shows a number expansion of AS exons, which is involved in the wasp's immune response and recruited to both wasp venom and larval saliva to suppress host immunity.
Not all isoforms recruited into wasp venom or larval saliva inhibit host melanization.These isoforms may serve other functions, such as regulating the host's Toll pathway or modulating host metabolism.Proteomic identification results from the pull-down assays of these isoforms can provide hints into their targets and functions, which will be an interesting future direction.Moreover, several isoforms identified in larval saliva failed to form complexes with host hemolymph proteins.These PpSerpin-1 isoform proteins may target proteins out of the host hemolymph, e.g., in host hemocytes or gonad cells, inside host guts, or regulate proteases from wasp larval saliva.Alternatively, the inconsistency between proteomic identification and functions of PpSerpin-1 isoforms may be simply due to leaky expression from the noise of AS regulation [50].More investigations are needed to test these hypotheses.
In addition to AS, GD and domain duplication jointly contribute to the protein diversity of insect serpins.Among them, AS and GD are the major sources of protein diversity.We observed an inverse relationship between the production of AS variants and gene family size across different species within a serpin clade.This is similar to previous reports that the production of AS variants was inversely correlated with the size of the gene family within the same species [9,10,51].These results suggest negative regulatory mechanisms between AS and GD in the production of protein diversity.
AS is an efficient mechanism to generate significant protein diversity without requiring duplication and divergence of the entire gene [2,3,8].In serpin AS genes, C-terminal AS tends to occur at the end of the hinge region with an extra nucleotide G.The serpin hinge region is critical to the conformational change and thus the inhibitory function of serpin [30,31,33,34], while the extra nucleotide G may be important in the splicing process of AS.The splicing sites in AS serpins reflect the maximum reuse of sequence.Compared to non-AS genes, AS genes are more conserved in the constitutive region but more variable in the alternative region.In addition, in the comparison of PpSerpin-1 with its orthologous gene LOC100122505 in N. vitripennis, the alternative exons show significantly higher nonsynonymous substitution rates than the constitutive exons.Collectively, these results imply that AS allows for distinct regions to experience divergent evolutionary pressures, enabling rapid protein evolution in serpin RCL while preserving most of the functional backbone under strong conservation.
On the other hand, the regulation of AS is often complicated [52].For pairwise substitution rates between PpSerpin-1 and its orthologous gene LOC100122505 in N. vitripennis, alternative exons show significantly lower synonymous substitution rates than constitutive exons, suggesting that synonymous sites may be critical in the regulation of AS and thus under higher selective pressure in alternative exons [1,2,47].In addition, the binding motifs of two transcription factors, dl and Dif, are enriched in the AS gene set compared to the non-AS gene set, particularly preferring the intron region of the coding strand of AS genes.Genes with more C-terminal AS are more likely to contain dl and Dif binding motifs.Accumulated evidence suggests that transcription factors can be directly or indirectly involved in the regulation of AS, although the mechanisms are not fully understood [53,54].As pre-mRNA splicing mainly takes place during transcription, extensive crosstalk has been reported between these two processes [53].One possible explanation could be that some expanded serpin isoforms are involved in the Toll pathway.The expression of these isoforms might be regulated by Toll-pathway-related dl and Dif, probably through binding to the intron regions of serpin gene pre-mRNA.
To manipulate their hosts, parasitoid wasps often recruit effector genes, e.g., venom or larval salivary genes [14,15].Serpin, a common component of parasitoid venom [17,[20][21][22][23][35][36][37][38][39][40][41][42][43], has also been reported in teratocytes of parasitoid wasps for host regulation [55,56].Here, we further extend serpin recruitment to wasp larval saliva.A previous study also reported that P. puparum larval saliva inhibits host immunity [48].In light of this, the higher numbers of serpin protein/domains in parasitoid wasps may be related to the recruitment of host effector genes.Another example of an AS venom gene is LOC122512947 from Leptopilina heterotom with 2 N-terminal and 20 C-terminal AS isoforms [36].In addition, Leptopilina boulardi LbSPNy (ACQ83466.1)was reported as a venom non-AS gene that inhibits host melanization [43].LbSPNy forms a Leptopilina-specific non-AS gene cluster with LOC122509667, 122505241, 12200434, 122502279, 122503460, and 122502269 of L. heterotoma, suggesting that LbSPNy likely originated from GD. Extensive GD of the serpin gene was also reported in the venom of M. mediator [21].Our findings further show that the increased number of serpins in parasitoids results from more GD but less AS.This is consistent with the general observation that venom genes in parasitoid wasps and other venomous animals are more likely to originate from GD than AS [14,57].
In PpSerpin-1, we observed a number expansion of alternative exons compared to other parasitoid serpin genes.Reconstructing the accurate evolutionary history of these exons is challenging, particularly for ancient expanded exons, due to their accelerated protein evolution and short sequences (44~50 amino acids for PpSerpin-1 alternative regions).Phylogenetic tree analysis revealed that alternative exons of PpSerpin-1H and X2 form a gene-specific cluster, suggesting a recent Pteromalus-specific exon duplication.However, the majority (16 out of 21) of PpSerpin-1 alternative exons show clear orthologous relationships with those of the N. vitripennis LOC100122505 gene.One possible explanation is that these exons underwent lineagespecific exon duplications before the divergence of Pteromalus and Nasonia (~19 MYA) [46] and were retained in both species, probably serving conserved and essential functions.The expanded alternative exons in PpSerpin-1 may contribute to its ecological adaptation.Consistent with this hypothesis, we found that several isoforms of PpSerpin-1 are involved in the wasp's immune response, have been recruited to both wasp venom and larval saliva, and suppress host immunity.
However, no obvious expressional specialization, especially to venom glands, was observed for PpSerpin-1 isoforms.We speculate that accurate expressional regulation is an obstacle to the recruitment of host effector genes by AS, particularly that PpSerpin-1 is likely involved in the regulation of wasp self-immunity.Several PpSerpin-1 isoforms can be upregulated by M. luteus and B. bassiana, suggesting that PpSerpin-1 may be involved in the wasp's Toll pathway, which is preferentially activated by gram-positive bacteria and fungi in Drosophila [45].In AS genes, differential expressional regulation between self and host manipulation functions may be difficult to evolve.On the other hand, expressional divergence of duplicated genes often occurs after location segregation of gene copies [58], presumably due to changes in cis-regulatory regions.The positional linkage of alternative exons makes the expressional regulation of AS more challenging and requires more complicated regulatory mechanisms.In particular, high expressional specialization should be required to avoid self-harm for toxic virulent genes.This may explain why AS is less employed than GD in the recruitment of parasitoid host effector genes.
In summary, we show that both AS and GD contribute to the evolution of insect serpin with differential features.We report that a parasitoid serpin gene has evolved through exon number expansion of AS and show its involvement in wasp's immunity and recruitment into the wasp's venom and larval saliva to manipulate host immunity.

Insect rearing
Laboratory cultures of P. puparum, N. vitripennis, T. sarcophagae, M. raptor and M. uniraptor were maintained in Drosophila tubes at 25˚C with a photoperiod of 14:10 h (light:dark) as previously described [17,37].Pupae of Pieris rapae were used as hosts for P. puparum, and pupae of Sarchophaga bullata were used as hosts for N. vitripennis, T. sarcophagae, M. raptor and M. uniraptor.Once emerged, P. puparum adult wasps were fed ad libitum with 20% (v/v) honey solution to lengthen their life span.

Protein structure prediction
The protein structure of PpSerpin-1F was modeled using AlphaFold2 with default settings [62,63].No structural template was used.All five AlphaFold2 models were tested.Model 4 was selected as the best prediction based on its highest pLDDT score of 88.7.Alignemnts for protein modeling were generated through MMSeqs2 [64] against the UniRef+Environmental database.

Sequence fetch and feature analyses
BLASTP [65] was performed using the constitutive protein sequence of the PpSerpin-1 gene against the Refseq_protein database on NCBI (accessed at Dec 2021).The organism was limited to "Insecta (taxid:50557)" with a maximum target sequence of 5000 and an e-value of 1e -5 .One species per genus was selected as a representative to reduce sampling bias.GenBank files were retrieved for each gene using EFetch v0.2.2 (https://dataguide.nlm.nih.gov/edirect/efetch. html).Isoform sequences and splicing positions were extracted from GenBank files for each gene using homemade python scripts.For each gene, the number of N-terminal AS events was approximately estimated by counting the number of different sequences in the first 20 aa.Similarly, the number of C-terminal AS events was approximately estimated by counting the number of different sequences in the last 50 aa.Signal peptides were predicted using SignalP v5.0 [66].Serpin domains were annotated using the NCBI Batch Web CD-Search Tool (https:// www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi) against the "CDD 58235 PSSMs" database [67] (accessed at Jan, 2022).

Phylogenetic analyses
For gene phylogeny, the longest protein isoform was selected for each gene.Proteins were aligned using Mafft v7.310 [68].The gene tree was constructed using IQTree v2.2.0 with an ultra bootstrap of 1000 [69].The best-fit model (LG+R10) was automatically selected by builtin ModelFinder [70] in IQTree.For exon phylogeny, annotated serpin domains were extracted with a C-terminal extending 30 aa or to the end of the sequences if less than 30 aa.For genes with C-terminal AS, only nonredundant C-terminal sequences were included.Serpin domain sequences were aligned using Mafft v7.310 [68].The tree was constructed using IQTree v2.2.0 [69].To reduce sampling bias, one species per genus was selected as a representative for visualization and subsequent statistical analyses.Ancestral character estimation was performed for the C-terminal AS number of each internal node using the fastAnc function in the R package "phytools" v1.2.0 [71].Trees were pruned by Newick Utilities v1.6 [72] and visualized on iTOL [73].

Expansion level analyses
For every pair of genes from the same species in the gene phylogeny, two genes are defined as species-specific expansion if all genes in the clade of the common ancestor of these two genes belong to the same species.If all genes in the clade of the common ancestor of these two genes from the same species belong to the same family, two genes are defined as family-specific expansion, and so on.Similarly, for every two alternative C-terminal exons from the same gene in the exon phylogeny, two exons are defined as gene-specific expansion if all exons in the clade of the common ancestor of these two exons belong to the same gene, and so on.Expansion levels were determined by homemade R scripts.

WebLogo
Consensus sequence logos were created using WebLogo v3.7.9 [74].For protein alignments, conservation scores were estimated for each site using WebLogo v3.7.9 [74].Conservation scores of the corresponding positions in the reference sequence were extracted for comparisons.For the constitutive region of AS genes, the longest protein isoforms of each gene were selected as representatives.For the C-terminal alternative region of AS genes, all nonredundant sequences (based on the last 50 aa at the C-terminus) were used.Conservation scores were mapped to the PpSerpin-1F structure using PyMOL v2.5.0 (https://pymol.org).

dN and dS estimation
For each orthologous exon pair between PpSerpin-1 and LOC100122505 (the ortholog of PpSerpin-1 in N. vitripennis), protein sequences were aligned using Mafft v7.310 [68] and then reverse translated to codons using PAL2NAL v14 [75].If a codon crossed the boundary of two exons, the entire codon was counted in the exon that contained more nucleotide bases of that codon.Pairwise dN and dS values were estimated using PAML v4.9j [76].

Motif scanning and enrichment analysis
Motif enrichment analyses were performed using SEA (Simple Enrichment Analysis) in MEME v5.5.0 [77,78].Gene sequences with C-terminal AS were set as primary sequences, and shuffled sequences or gene sequences without C-terminal AS were set as control sequences.Drosophila melanogaster "CIS-BP 2.00 single species DNA" or "CISBP-RNA single species RNA" was set as the motif database.Fisher's exact tests were performed if the primary and control sequences had the same average length (within 0.01%); otherwise, binomial tests were performed [78].Sequences of pomotor, exon and intron regions were extracted from GenBank files.Promoter regions were approximated by using the upstream sequences of the translation start sites.Exon regions were approximated using coding sequences by masking other regions using "N".For binding motifs of dl (M02712_2.00)and Dif (M03953_2.00),hits were detected with a p value threshold of 1 using FIMO in MEME [77].Shuffled sequences were created by fasta-shuffle-letters in MEME [77].

Specific RT-PCR
Ptermoalus puparum embryos (<10 hr after parasitism), larvae (combined 2-3 day larvae after parasitism), yellow pupae (mixed with female and male pupae), adult females and males (combined 1-5 day adults after emergence) were collected and rinsed with PBS.Total RNA was extracted using TRIzol reagent (Invitrogen, USA) according to the manufacturer's protocol and then reverse transcribed using TransScript One-step gDNA Removal and cDNA Synthesis Super-Mix (TransGen, Beijing, China) with random primers.Isoform-specific primers were designed to span constitutive exon 7 and alternative exon 8 using PerlPrimer v1.1.21[79] and are listed in S1 Table .PCR was performed using TransTaq HiFi DNA Polymerase (TransGen, Beijing, China) with 20-35 cycles of amplification.PCR products were analyzed by electrophoresis on a 1% (g/ mL) agarose gel and confirmed by Sanger sequencing (Sangon Biotech, Shanghai, China).

Microbial stimulus of P. puparum
Freshly cultured M. luteus, E. coli, and B. bassiana were harvested by centrifugation and subsequently washed with sterile PBS three times.These microbes were suspended in PBS at a density of 5×10 7 cells/mL.Two-day old P. puparum female adults were selected and anesthetized using carbon dioxide.A sterilized acupuncture needle was immersed in the suspension containing either bacteria or fungi and then inserted into the inter-segmental space of the parasitoid wasp's abdomen.Wasps that were pricked with sterile PBS were used as a control group.Female wasps were then collected at 6, 24, and 48 h after the microbial stimulus.

Wasp larval saliva collection
Saliva of wasp larvae was collected as previously described with minor modifications [48].Briefly, wasp larvae were collected by opening host pupae 3 days after parasitism.Larvae were rinsed with PBS and electrically stimulated using an acupuncture device (Hwato, China) with a frequency of 50 Hz and a current of 2.5 mA.Secreted saliva drops at larval mothparts were transferred into PBS using pipette tips.The protein concentration was determined using a Pierce BCA Protein Assay Kit (Thermo Scientific, USA).

LC-MS/MS
Fifty micrograms of protein from wasp larval saliva was digested by trypsin using the filteraided sample preparation (FASP) method.After desalting by a C18 cartridge, the digestion product was lyophilized and redissolved in 40 μl of 0.1% formic acid solution.LC-MS/MS was conducted on an Easy nLC HPLC system (Thermo Scientific, USA) with a flow rate of 300 nl/ min followed by Q-Exactive (Thermo Finnigan, USA).The sample was loaded on a Thermo Scientific EASY column (5 μm, 2 cm × 100 μm, C18) and then separated on another Thermo Scientific EASY column (3 μm, 75 μm × 100 mm, C18).Buffer A was water with 0.1% formic acid, and buffer B was 84% acetonitrile with 0.1% formic acid.The columns were first equilibrated with 95% buffer A, then from 0% to 35% buffer B in 50 min, from 35% to 100% buffer B in 5 min, and finally 100% buffer B for 5 min.The charge-to-mass ratios of peptides and fractions of peptides were collected 20 times after every full scan.The resulting MS/MS spectra were searched against the P. puparum genome database using Mascot 2.2 in Proteome Discoverver."Carbamidomethyl (C)" was set as a fixed modification."Oxidation (M)" and "Acetyl (Protein N-term)" were set as variable modifications.The maximum number of missed cleavages was set as 2. FDR � 0.01 was set to filter the protein identification.The same software and parameters were used for the reanalysis of P. puparum venom proteomic data [37].This part was conducted by Shanghai Applied Protein Technology Co., Ltd.(Shanghai, China).

Western blot
Protein was separated by electrophoresis on 12% SDS-PAGE and transferred to a PVDF (polyvinylidene difluoride) membrane at 100 mA for 2 h using a Mini-PROTEAN Tetra system (Bio-Rad, USA).For detection of PpSerpin-1 isoform proteins or PrPAP1, antibodies against PrPAP1 and PpSerpin-1O [22] (diluted 1:1000) were used as primary antibodies, followed by HRP (horseradish peroxidase)-conjugated goat anti-rabbit IgG antibody (Sigma Aldrich, Germany; diluted 1:5000) as the secondary antibody.For detection of His-tagged proteins, THE His Tag mouse antibody (GenScript, Nanjin, China; diluted 1: 2000) was used as the primary antibody, followed by goat anti-mouse IgG antibody-HRP conjugate (Sigma Aldrich, Germany; diluted 1: 5000) as the secondary antibody.The membranes were detected using Pierce ECL Western Blotting Substrate ECL (Thermo Fisher, USA) and imaged by the Chemi Doc-It 600 Imaging System (UVP, Cambridge, UK).

Recombinant protein expression and purification
For recombinant expression of PpSerpin-1 isoforms, constitutive fragments without signal peptides and alternative fragments were separately amplified using TransTaq HiFi DNA Polymerase (TransGen, Beijing, China) and cloned into the pET-28a vector using the ClonExpress MultiS One Step Cloning Kit (Vazyme, Nanjing, China).Primers were designed using PerlPrimer V1.1.21and are listed in S2 Table .The linear vector pET-28a was generated by digestion with BamHI and XhoI (TaKaRa, Dalian, China).Recombinant plasmids were then transfected into E. coli BL21(DE3) Chemically Competent Cell (TransGen Biotech, Beijing, China) and confirmed by Sanger sequencing (Sangon Biotech, Shanghai, China).E. coli cells were grown in autoinduction medium [80] containing 100 μg/μl kanamycin at 300 rpm and 20˚C for 48 h and then harvested by centrifugation at 12000 × g for 20 min.Recombinant protein was extracted using BugBuster Protein Extraction Reagent (Thermo Scientific, USA) and purified using cOmplete His-Tag Purification Resin (Roche, Switzerland) and His-Bind Purification kit (Novagen, USA) according to the manufacturer's protocol.The concentration of purified protein was determined using a Pierce BCA Protein Assay Kit (Thermo Scientific, USA).

Phenoloxidase activity assay
Plasma was harvested by cutting the hind legs of P. rapae larvae with scissors and diluted four times into ice-cold TBS buffer (20 mM Tris, 150 mM NaCl, pH = 7.6).Cell-free hemolymph was obtained by centrifugation at 4˚C and 3000 × g for 10 min to remove hemocytes.To screen for the inhibitory activities of PpSerpin-1 isoforms on host melanization, 5 μl of recombinant protein (0.2 μg/μl) was mixed with 10 μl of diluted P. rapae hemolymph in a 384-well plate.For each sample, 5 μl of elicitor (0.1 μg/μl M. luteus) and 5 μl of substrate solution (50 mM L-Dopa in PBS, pH = 7.5) were first mixed and added to another 384-well plate, which was fixed upside down on the sample plate.By centrifuging these two oppositely fixed 384-well plates, the PPO (prophenoloxidase) cascade was activated in each well simultaneously.Plates were measured at A470 and 25˚C every 5 min for 2 h using a Varioskan Flash multimode reader (Thermo Scientific, USA).For dose-dependent assays, 5 μl of recombinant protein was mixed with 15 μl of diluted P. rapae hemolymph and 5 μl of elicitor (0.1 μg/μl M. luteus).After incubation at 25˚C for ~10 min, 800 μl of substrate solution (20 mM Dopa in PBS, pH = 6.5) was added.Samples (200 μl) were monitored at A470 in 96-well plates for 20 min using a Varioskan Flash multimode reader (Thermo Scientific, USA).One unit of PO activity was defined as 0.001 4A470/min.

Pull-down assay
Recombinant protein of PpSerpin-1 isoforms (10 μg) was mixed with 1 ml diluted P. rapae hemolymph, 50 μl saturated PTU and 100 μl M. luteus (1 μg/μl) and incubated on a rotator overnight at 4˚C.After centrifugation at 12000 g and 4˚C for 20 min, the supernatant was further incubated with 25 μl of cOmplete His-tag purification resins (Roche, Switzerland) at 4˚C for 2 h.After washing three times with 300 μl of washing buffer (1 M NaCl, 120 mM imidazole, 40 mM Tris-HCl, pH 7.9), proteins were eluted with 50 μl of elution buffer (1 M imidazole, 0.5 M NaCl, 20 mM Tris-HCl, pH 7.9) and subjected to SDS-PAGE followed by Lumitein Protein Gel Stain (Biotium, Hayward, CA, USA) and immunoblotting.To identify potential targets in pull-down complexes, gels were cut between 60 and 100 kDa.Gel slices were then in-gel digested by trypsin at 37˚C for 20 h.After desalting and lyophilization, the enzymatic product was redissolved in 0.1% formic acid solution and subjected to LC-MS/MS.The parameters used were the same as for the above, except specifically mentioned.The gradient was 1 h.Annotated proteins from Pieris rapae genome assembly GCF_001856805.1 were set as the search database."Oxidation (M)" was set as a variable modification.Protein identification was filtered by proteins with at least two peptides identified.Gel digestion and proteomic identification were conducted by Shanghai Applied Protein Technology Co., Ltd.(Shanghai, China).

Statistics
All statistical analyses were performed in R v4.1.2.Differences between Spearman correlations were tested using the Hittner2003 method in the R package "cocor" v1.1.4[81].To avoid biases that may arise from data non-independence due to shared evolutionary history, phylogenetic correction was performed using the independent contrasts method.Phylogenetic correction allows us to account for the evolutionary relationships among species and obtain more accurate estimates of trait correlations.For independent contrasts, the phylogenetic species tree was generated by phyloT v2 (https://phylot.biobyte.de/).The branch length was estimated by compute.brlen in the R package "ape" v5.6.2 [82] using Grafen's (1989) methods [83].Independent contrasts were conducted using the "pic" function in the R package "ape" [82].Correlations through origins were estimated for independent contrasts using the R package "picante" v1.8.2 [84].

Fig 1 .
Fig 1. Alternative splicing (AS), gene duplication (GD) and domain duplication contribute to serpin protein diversity in insects.(a) Gene structure of PpSerpin-1.Gray indicates constitutive exons.Blue and red indicate N-terminal and C-terminal alternative exons, respectively.(b) Protein structure of PpSerpin-1F predicted by AlphaFold2.PpSerpin-1F is the longest protein isoform of the PpSerpin-1 gene and was chosen as the representative.The gradient colors from blue to red indicate different exons.The bracket indicates the signal peptide.Black arrows indicate alternative translation start sites.The dashed circle indicates the hinge region of PpSerpin-1F.(c) Phylogenetic tree of serpin in insects.The different colors on the labels indicate different insect orders.Black arrows indicate parasitoid wasps.The red and blue dots on branch terminals indicate the presence (extracellular) and absence (intracellular) of the signal peptide, respectively.Black dots on branch terminals indicate N-terminal AS with both extracellular and intracellular isoforms.The red bars indicate the log2 transformed numbers of C-terminal AS events.The gray bars indicate the log2 transformed numbers of serpin domains within a gene.(d) Relationship between the number of C-terminal AS events and gene localization.(e) Relationship between the percentage of C-terminal alternative spliced genes and serpin gene copy number.(f) Relationship between the number of C-terminal AS events and serpin gene copy number.(g) Relationship between the number of C-terminal AS events and sperin domain number.*** p < 0.001; * p < 0.05.https://doi.org/10.1371/journal.ppat.1011649.g001For the AS gene set, the splicing positions of C-terminal AS occurred mainly near the end of the hinge region (GSEAAAVT in PpSerpin-1) (Fig 2A).Considering the last nucleotide at the end of the hinge region as position 0, 161 out of 192 (83.9%)AS genes spliced at position +1 and 19 (9.9%) at position +4.The AS gene set was more likely to splice at position +1 than the non-AS gene set (Fig 2A; χ 2 = 35.81,p = 0.00001).Both AS and non-AS gene sets have G| GTAAGT and TTNCAG|N sequence motifs near C-terminal splicing sites (Fig 2B).Position 0 (at codon position 3), +5 (at codon position 2) and +11 (at codon position 2) are more inclined to be G, T and T in the AS gene set than in the non-AS gene set, respectively (Fig 2B; p < 0.05).

Fig 3 .
Fig 3. Parasitoid wasps employ fewer AS events in serpin, but PpSerpin-1 shows an expansion of the C-terminal AS number.(a-f) Comparison between parasitoid wasps and nonparasitoid hymenopterans of (a) total serpin protein per species, (b) gene copy number, (c) percentage of genes with C-terminal AS, (d) average C-terminal AS number, (e) distribution of gene expansion and (f) distribution of exon expansion.(g) Expansion of the C-terminal AS number in PpSerpin-1.Black arrows indicate parasitoid wasps.The red and blue dots on branch terminals indicate the presence (extracellular) and absence (intracellular) of the signal peptide, respectively.Black dots on branch terminals indicate N-terminal AS with both extracellular and intracellular isoforms.The gradient color on branches indicates the estimated ancestral states of C-terminal AS numbers.The red bars indicate the log2 transformed numbers of C-terminal AS events.The gray bars indicate the log2 transformed numbers of serpin domains within a gene.*** p < 0.001, ** p < 0.01.https://doi.org/10.1371/journal.ppat.1011649.g003

Fig 4 .
Fig 4. PpSerpin-1 isoforms are involved in the wasp's immune response and parasitism.(a) Isoform-specific RT-PCR of PpSerpin-1 isoforms.(b) Expression of PpSerpin-1 isoforms in response to injection of PBS, E. coli, M. luteus or B. bassiana.Expression levels were calculated from RNA-seq data.(c) Expression of PpSerpin-1 isoforms in different developmental stages and tissues.Black dots indicate proteomic identification in P. puparum larval saliva.Black stars indicate proteomic identification in venom of P. puparum female wasps.(d) Western blot detection in venom of P.
Fig 5A; Dunnett's test; p = 0.0038), and this inhibitory effect could be eliminated by the antibody of PpSerpin-1 (Fig 5A; Dunnett's test; p > 0.05).To further investigate the function of PpSerpin-1 in suppressing host melanization, recombinant proteins of each isoform were produced (S6 Fig).PpSerpin-1 isoforms A, G, O and P significantly inhibited hemolymph melanization of P. rapae (Fig 5B; Dunnett's test; all comparisons: p < 0.001) in a dose-dependent manner (S7 Fig) and formed complexes with host hemolymph proteins in pull-down assays (Fig 5C).Pull-down assays were also performed for the remaining isoforms, which were identified in wasp venom or larval saliva.PpSerpin-1B and H formed complexes with host hemolymph proteins, although they showed no inhibition of host melanization (Fig 5C).